Extracting Structural Paraphrases from Aligned Monolingual Corpora

نویسندگان

  • Ali Ibrahim
  • Boris Katz
  • Jimmy Lin
چکیده

We present an approach for automatically learning paraphrases from aligned monolingual corpora. Our algorithm works by generalizing the syntactic paths between corresponding anchors in aligned sentence pairs. Compared to previous work, structural paraphrases generated by our algorithm tend to be much longer on average, and are capable of capturing long-distance dependencies. In addition to a standalone evaluation of our paraphrases, we also describe a question answering application currently under development that could immensely benefit from automatically-learned structural paraphrases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extracting Lay Paraphrases of Specialized Expressions from Monolingual Comparable Medical Corpora

Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We ide...

متن کامل

Unsupervised Learning of Paraphrases

Paraphrasing constitutes a corner stone in many Natural Language Processing fields like monolingual text-to-text generation and automatic text summarization. Indeed, aligned monolingual corpora are likely to boost the learning process of text-to-text generation models. A Paraphrase learning strategy can be defined as a two-step process: (1) identifying and extracting related sentence pairs from...

متن کامل

Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its a...

متن کامل

Enlarging Paraphrase Collections through Generalization and Instantiation

This paper presents a paraphrase acquisition method that uncovers and exploits generalities underlying paraphrases: paraphrase patterns are first induced and then used to collect novel instances. Unlike existing methods, ours uses both bilingual parallel and monolingual corpora. While the former are regarded as a source of high-quality seed paraphrases, the latter are searched for paraphrases t...

متن کامل

Extracting Paraphrases from Aligned Corpora

The Problem: The expressiveness of human language allows people to express the same idea in many different ways; they may use different words to refer to the same entity or employ different phrases to describe the same concept. Thus, an effective information retrieval (IR) and question answering (QA) system must be equipped to handle these variations, both when processing documents and when fie...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003